Data Partitioning for Incremental Data Mining
Abstract
Data repositories of interest in data mining applications can be very large, and many existing learning algorithms do not scale up to extremely large data sets. One approach to this problem is to apply the concept of incremental learning. However, incremental data mining is not the same as incremental machine learning: the former handles one subset of the data at a time, whereas the latter handles a single data instance at a time. The size of the data subset determines both the performance and the speed of the mining process. We therefore focus on partitioning the data into subsets of a proper size and propose an algorithm that returns a data subset suitable for both classification and association mining tasks. We also perform a set of experiments to observe the behavior of classification and association mining under various data partitionings. The experimental results confirm our data partitioning criteria.
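The abstract does not describe the proposed partitioning algorithm itself. As a hedged illustration of why subset size matters when mining one subset at a time, the sketch below substitutes a different, well-known technique: the two-pass "Partition" approach to frequent-itemset (association) mining. Each subset is mined locally with the same relative support threshold. Because any globally frequent itemset must be locally frequent in at least one subset, a second pass only needs to count those candidates. The function names `partition`, `count_itemsets`, and `partition_mine`, the cap of two items per itemset, and the toy transactions are all hypothetical choices made for brevity. None of them come from the paper.

```python
# Hypothetical sketch: a Partition-style frequent-itemset miner, NOT the paper's algorithm.
from collections import Counter
from itertools import combinations
from typing import AbstractSet, FrozenSet, Iterator, Sequence, Set

Itemset = FrozenSet[str]


def partition(data: Sequence[AbstractSet[str]], subset_size: int) -> Iterator[Sequence[AbstractSet[str]]]:
    """Split the data into consecutive, non-overlapping subsets of at most subset_size transactions."""
    if subset_size <= 0:
        raise ValueError("subset_size must be positive")
    for start in range(0, len(data), subset_size):
        yield data[start:start + subset_size]


def count_itemsets(transactions: Sequence[AbstractSet[str]], max_size: int) -> Counter:
    """Count every itemset of 1..max_size items contained in the given transactions."""
    counts: Counter = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return counts


def partition_mine(data, subset_size, min_support, max_size=2):
    """Return (globally frequent itemsets with their counts, number of candidates checked)."""
    # Pass 1: mine each subset independently with the same *relative* threshold.
    # A globally frequent itemset must be locally frequent in at least one subset.
    candidates: Set[Itemset] = set()
    for subset in partition(data, subset_size):
        local = count_itemsets(subset, max_size)
        local_threshold = min_support * len(subset)
        candidates.update(s for s, c in local.items() if c >= local_threshold)

    # Pass 2: count only the candidates over the full data, again one subset at a time.
    global_counts: Counter = Counter()
    for subset in partition(data, subset_size):
        for t in subset:
            for s in candidates:
                if s <= t:  # candidate itemset is contained in this transaction
                    global_counts[s] += 1

    global_threshold = min_support * len(data)
    frequent = {s: c for s, c in global_counts.items() if c >= global_threshold}
    return frequent, len(candidates)


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b", "c"}, {"a", "b"}, {"d"}]
    for size in (2, 3, 6):
        frequent, n_candidates = partition_mine(data, subset_size=size, min_support=0.5)
        print(size, n_candidates, sorted("".join(sorted(s)) for s in frequent))
```

On this toy data, all three subset sizes return the same frequent itemsets ({a}, {b}, {c}, {a, b}). The number of candidates that must be re-counted, however, falls from 7 to 5 to 4 as the subsets grow. Small subsets save memory per pass but admit more false-positive candidates, which is one concrete way in which the subset size trades off speed against accuracy of the first pass.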
Publication date: 2003